Out of Core FFTs in a Parallel Application Environment
Abstract
A principal mission at the NAS facility is to establish highly parallel computer systems that support full-scale production use by 1996. To fulfill this objective, parallel systems must support high-speed, scalable I/O suitable for handling the output of large-scale numerical aerodynamic simulations. In pursuit of this goal, we seek to execute an 'out of core' radix-2 Fast Fourier Transform (FFT) as rapidly as possible. By 'out of core' we mean that the problem is too large to fit in main memory. We implement out-of-core methods on two computer architectures: the CM5 with a scalable disk array (SDA), and the Intel iPSC/860 with a Concurrent File System (CFS). For the out-of-core FFT, the most successful I/O approach known is to apply an intermediate transpose of the data, viewed as a two-dimensional matrix [2]. We implement and evaluate three I/O methods for performing the required transpose: the first (row/col) transposes by exchanging rows and columns of the data set, the second (diagonal) exchanges diagonals, and the third (recursive) applies a recursive divide-and-conquer approach. Only the recursive method can feasibly be implemented on the CM5 SDA, because the SDA does not support 'independent access', the ability of each processor to maintain an independent pointer into a given file on disk. We find that, in absolute terms, the CM5 SDA using the recursive method outperforms the iPSC/860 CFS using any of the three methods for solving out-of-core FFTs. However, the performance of the recursive method declines rapidly with problem size, whereas the iPSC/860 CFS, using either the diagonal or the row/col method, scales well with problem size. We conclude that if the CM5 SDA provided the independent access offered by the iPSC/860 CFS, it would scale as well as the iPSC/860 CFS, but at a much higher level of absolute performance.
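The abstract does not spell out the transpose-based FFT of [2]. As a rough, in-memory illustration only, the Python below assumes the common "six-step" formulation (transpose, row FFTs, twiddle multiply, transpose, row FFTs, transpose) for a length N = n1 * n2 transform, together with a divide-and-conquer transpose in the spirit of the recursive method. Flat Python lists stand in for the disk-resident file; in an actual out-of-core code each leaf block of the transpose would correspond to one read, in-memory transpose, and write of a piece that fits in memory. The names `fft_radix2`, `recursive_transpose`, `six_step_fft`, and the `leaf` block-size parameter are hypothetical and are not taken from the paper.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def recursive_transpose(src, rows, cols, leaf=4):
    """Transpose a row-major rows x cols matrix (flat list) by divide and conquer.

    The larger dimension is halved until a block holds at most `leaf` elements
    (hypothetical block size standing in for "fits in memory"); each leaf block
    is then copied element by element into the cols x rows result.
    """
    if leaf < 1:
        raise ValueError("leaf must be at least 1")
    dst = [None] * (rows * cols)

    def rec(r0, r1, c0, c1):
        if (r1 - r0) * (c1 - c0) <= leaf:
            for i in range(r0, r1):
                for j in range(c0, c1):
                    dst[j * rows + i] = src[i * cols + j]
        elif r1 - r0 >= c1 - c0:
            mid = (r0 + r1) // 2  # split the rows
            rec(r0, mid, c0, c1)
            rec(mid, r1, c0, c1)
        else:
            mid = (c0 + c1) // 2  # split the columns
            rec(r0, r1, c0, mid)
            rec(r0, r1, mid, c1)

    rec(0, rows, 0, cols)
    return dst


def six_step_fft(x, n1, n2, leaf=4):
    """Length n1*n2 FFT in which every 1-D FFT runs on a contiguous row."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # View x as an n2 x n1 row-major matrix A[j2][j1] = x[j1 + n1*j2].
    b = recursive_transpose(x, n2, n1, leaf)  # n1 x n2: B[j1][j2]
    c = []
    for j1 in range(n1):
        row = fft_radix2(b[j1 * n2:(j1 + 1) * n2])  # length-n2 FFT
        # Twiddle factor exp(-2*pi*i*j1*k2/N) applied to element k2.
        c.extend(v * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
                 for k2, v in enumerate(row))
    d = recursive_transpose(c, n1, n2, leaf)  # n2 x n1: D[k2][j1]
    e = []
    for k2 in range(n2):
        e.extend(fft_radix2(d[k2 * n1:(k2 + 1) * n1]))  # length-n1 FFT
    # Final transpose puts X[k2 + n2*k1] at flat index k1*n2 + k2 (natural order).
    return recursive_transpose(e, n2, n1, leaf)


if __name__ == "__main__":
    data = [complex(i % 5, 0) for i in range(32)]
    direct = fft_radix2(data)
    staged = six_step_fft(data, 4, 8)
    print("max abs difference:", max(abs(a - b) for a, b in zip(direct, staged)))
```

The point of the staging is that both FFT passes read contiguous rows; all strided access is pushed into the transposes, which is where the row/col, diagonal, and recursive I/O methods differ.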
Similar resources
Multidimensional, Multiprocessor, Out-of-Core FFTs with Distributed Memory and Parallel Disks
We show how to compute multidimensional Fast Fourier Transforms (FFTs) on a multiprocessor system with distributed memory when problem sizes are so large that the data do not fit in the memory of the entire system. Instead, data reside on a parallel disk system and are brought into memory in sections. We use the Parallel Disk Model for implementation and analysis. Our method is a straightforwar...
Two Algorithms for Performing Multidimensional, Multiprocessor, Out-of-Core FFTs
We describe two algorithms for computing multidimensional Fast Fourier Transforms (FFTs) on a multiprocessor system with distributed memory when problem sizes are so large that the data do not fit in the memory of the entire system. Instead, data reside on a parallel disk system and are brought into memory in sections. We use the Parallel Disk Model for implementation and analysis. The first me...
Performing Out-of-Core FFTs on Parallel Disk Systems
The Fast Fourier Transform (FFT) plays a key role in many areas of computational science and engineering. Although most one-dimensional FFT problems can be solved entirely in main memory, some important classes of applications require out-of-core techniques. For these, use of parallel I/O systems can improve performance considerably. This paper shows how to perform one-dimensional FFTs using a ...
A Hybrid MPI/OpenMP 3D FFT for Plane Wave First-principles Materials Science Codes
First principles electronic structure calculations based on a plane wave expansion of the wavefunctions are the most commonly used approach for electronic structure calculations in materials and nanoscience. In this approach the electronic wavefunctions are expanded in Fourier components and 3D FFTs are used to construct the charge density in real space. Efficient parallel 3D FFTs are required ...
A Portable 3D FFT Package for Distributed-Memory Parallel Architectures
1 Introduction. Multidimensional FFTs are used frequently in engineering and scientific calculations, especially in image processing. Parallel implementations of FFT generally follow two approaches. One is the binary-exchange approach [1, 2], where data exchanges take place in all pairs of processors with processor numbers differing by one bit. Another one is the transpose approach[...